gradient play
Higher-Order Uncoupled Learning Dynamics and Nash Equilibrium
Toonsi, Sarah A., Shamma, Jeff S.
We study learnability of mixed-strategy Nash Equilibrium (NE) in general finite games using higher-order replicator dynamics as well as classes of higher-order uncoupled heterogeneous dynamics. In higher-order uncoupled learning dynamics, players have no access to utilities of opponents (uncoupled) but are allowed to use auxiliary states to further process information (higher-order). We establish a link between uncoupled learning and feedback stabilization with decentralized control. Using this association, we show that for any finite game with an isolated completely mixed-strategy NE, there exist higher-order uncoupled learning dynamics that lead (locally) to that NE. We further establish the lack of universality of learning dynamics by linking learning to the control theoretic concept of simultaneous stabilization. We construct two games such that any higher-order dynamics that learn the completely mixed-strategy NE of one of these games can never learn the completely mixed-strategy NE of the other. Next, motivated by imposing natural restrictions on allowable learning dynamics, we introduce the Asymptotic Best Response (ABR) property. Dynamics with the ABR property asymptotically learn a best response in environments that are asymptotically stationary. We show that the ABR property relates to an internal stability condition on higher-order learning dynamics. We provide conditions under which NE are compatible with the ABR property. Finally, we address learnability of mixed-strategy NE in the bandit setting using a bandit version of higher-order replicator dynamics.
- North America > United States > Illinois > Champaign County > Urbana (0.14)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- (3 more...)
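The "higher-order" idea above can be illustrated with a toy construction: standard replicator dynamics augmented with an auxiliary state per player that filters the payoff derivative, in the spirit of anticipatory learning. The sketch below is an illustrative assumption, not the paper's exact dynamics; it runs matching pennies, whose unique NE is completely mixed, and plain replicator dynamics (lam = 0) would merely orbit that NE.

```python
import numpy as np

# Matching pennies: row player's payoff matrix; the column player gets -A.
# The unique NE is completely mixed: (0.5, 0.5) for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])

def replicator_step(x, u, dt):
    # One Euler step of replicator dynamics: xdot_i = x_i * (u_i - x.u)
    x = x + dt * x * (u - x @ u)
    x = np.maximum(x, 1e-12)
    return x / x.sum()          # renormalize against Euler drift

def simulate(T=200_000, dt=1e-3, lam=0.5):
    x = np.array([0.9, 0.1])            # row player's mixed strategy
    y = np.array([0.2, 0.8])            # column player's mixed strategy
    zx, zy = np.zeros(2), np.zeros(2)   # auxiliary (higher-order) states
    ux_prev, uy_prev = A @ y, -A.T @ x
    for _ in range(T):
        ux, uy = A @ y, -A.T @ x        # each player sees only its own payoffs
        # Auxiliary states low-pass filter the payoff derivative, giving
        # each player "higher-order" information to act on.
        zx += dt * ((ux - ux_prev) / dt - zx)
        zy += dt * ((uy - uy_prev) / dt - zy)
        x = replicator_step(x, ux + lam * zx, dt)
        y = replicator_step(y, uy + lam * zy, dt)
        ux_prev, uy_prev = ux, uy
    return x, y

print(simulate())   # with lam > 0 both strategies drift toward (0.5, 0.5)
```

Note the dynamics stay uncoupled: each player's update uses only its own payoff stream and its own auxiliary state, never the opponent's utility.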
Online Competitive Information Gathering for Partially Observable Trajectory Games
Krusniak, Mel, Xu, Hang, Palermo, Parker, Laine, Forrest
Game-theoretic agents must make plans that optimally gather information about their opponents. These problems are modeled by partially observable stochastic games (POSGs), but planning in fully continuous POSGs is intractable without heavy offline computation or assumptions on the order of belief maintained by each player. We formulate a finite history/horizon refinement of POSGs which admits competitive information gathering behavior in trajectory space, and through a series of approximations, we present an online method for computing rational trajectory plans in these games which leverages particle-based estimates of the joint state space and performs stochastic gradient play. We also provide the adjustments required to deploy this method on individual agents. The method is tested in continuous pursuit-evasion and warehouse-pickup scenarios (alongside extensions to $N > 2$ players and to more complex environments with visual and physical obstacles), demonstrating evidence of active information gathering and outperforming passive competitors.
- North America > United States > Texas (0.04)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Information Technology > Game Theory (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)
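As a rough illustration of the "particle-based estimation plus stochastic gradient play" ingredient (a toy reduction, not the authors' method), the sketch below has a pursuer hold a particle belief over an evader's position and descend the expected squared distance on random particle minibatches. All names, dynamics, and parameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_gradient_play(pursuer, particles, steps=200, lr=0.1, batch=32):
    # Optimize the pursuer's control u against a particle belief over the
    # evader by stochastic gradient descent on minibatches of particles.
    u = np.zeros(2)                              # control: a displacement
    for _ in range(steps):
        idx = rng.integers(len(particles), size=batch)
        pts = particles[idx]
        # gradient of E ||pursuer + u - evader||^2 over the minibatch
        grad = 2.0 * np.mean(pursuer + u - pts, axis=0)
        u -= lr * grad
    return u

# Particle belief over the evader's position (illustrative Gaussian cloud).
particles = rng.normal(loc=[5.0, 3.0], scale=1.0, size=(1000, 2))
pursuer = np.array([0.0, 0.0])
print(stochastic_gradient_play(pursuer, particles))
# The control points roughly toward the particle mean (5, 3).
```

In the paper's setting the objective is a game-theoretic trajectory cost rather than this single quadratic, but the estimate-then-descend loop is the same shape.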
Gradient play in stochastic games: stationary points, convergence, and sample complexity
Zhang, Runyu, Ren, Zhaolin, Li, Na
We study the performance of the gradient play algorithm for stochastic games (SGs), where each agent tries to maximize its own total discounted reward by making decisions independently based on current state information which is shared between agents. Policies are directly parameterized by the probability of choosing a certain action at a given state. We show that Nash equilibria (NEs) and first-order stationary policies are equivalent in this setting, and give a local convergence rate around strict NEs. Further, for a subclass of SGs called Markov potential games (which includes the setting with identical rewards as an important special case), we design a sample-based reinforcement learning algorithm and give a non-asymptotic global convergence rate analysis for both exact gradient play and our sample-based learning algorithm. Our result shows that the number of iterations to reach an $\epsilon$-NE scales linearly, instead of exponentially, with the number of agents. Local geometry and local stability are also considered, where we prove that strict NEs are local maxima of the total potential function and fully-mixed NEs are saddle points.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (5 more...)
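Because policies here are directly parameterized by action probabilities, each gradient step must be projected back onto the probability simplex. Below is a minimal sketch on an identical-interest 2x2 matrix game, a one-state instance of the Markov potential game setting the abstract highlights; the payoff matrix, step size, and initial strategies are illustrative.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto {x : x >= 0, sum(x) = 1}
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v + (1 - css[rho]) / (rho + 1), 0)

R = np.array([[3.0, 0.0], [0.0, 2.0]])   # shared payoff: two strict NEs

x = np.array([0.6, 0.4])                 # agent 1's action probabilities
y = np.array([0.45, 0.55])               # agent 2's action probabilities
for _ in range(500):
    gx, gy = R @ y, R.T @ x              # each agent's own payoff gradient
    x = project_simplex(x + 0.1 * gx)    # simultaneous projected ascent
    y = project_simplex(y + 0.1 * gy)
print(x, y)   # settles at a strict NE, here (1,0) for both agents
```

A fixed point of this projected update is exactly a first-order stationary policy, which is the NE-equivalence the abstract refers to.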
On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games
Zhang, Runyu, Mei, Jincheng, Dai, Bo, Schuurmans, Dale, Li, Na
Softmax policy gradient is a popular algorithm for policy optimization in single-agent reinforcement learning, particularly since projection is not needed for each gradient update. However, in multi-agent systems, the lack of central coordination introduces significant additional difficulties in the convergence analysis. Even for a stochastic game with identical interest, there can be multiple Nash Equilibria (NEs), which rules out proof techniques that rely on the existence of a unique global optimum. Moreover, the softmax parameterization introduces non-NE policies with zero gradient, making it difficult for gradient-based algorithms to find NEs. In this paper, we study the finite-time convergence of decentralized softmax gradient play in a special class of games, Markov Potential Games (MPGs), which includes the identical interest game as a special case. We investigate both gradient play and natural gradient play, with and without $\log$-barrier regularization. The established convergence rates for the unregularized cases contain a trajectory-dependent constant that can be arbitrarily large, whereas the $\log$-barrier regularization overcomes this drawback, at the cost of slightly worse dependence on other factors such as the action set size. An empirical study on an identical interest matrix game confirms the theoretical findings.
- North America > Canada > Alberta (0.14)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
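A minimal sketch of decentralized softmax gradient play with $\log$-barrier regularization on an identical-interest 2x2 matrix game, echoing the abstract's empirical setting. The payoff matrix, barrier weight lam, and step size are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def softmax(t):
    e = np.exp(t - t.max())
    return e / e.sum()

R = np.array([[3.0, 0.0], [0.0, 2.0]])   # shared (identical-interest) payoff
tx, ty = np.zeros(2), np.zeros(2)        # each agent's own logits
lam, lr, n = 0.05, 0.5, 2                # barrier weight, step size, actions

for _ in range(2000):
    px, py = softmax(tx), softmax(ty)
    ux, uy = R @ py, R.T @ px            # marginal payoffs per action
    # Softmax policy gradient pi * (u - pi.u), plus the gradient of the
    # log-barrier term (lam/n) * sum_a log pi(a), which is lam * (1/n - pi).
    gx = px * (ux - px @ ux) + lam * (1.0 / n - px)
    gy = py * (uy - py @ uy) + lam * (1.0 / n - py)
    tx += lr * gx                        # decentralized: no coordination,
    ty += lr * gy                        # each agent updates its own logits
print(softmax(tx), softmax(ty))   # near a strict NE, softened by the barrier
```

With lam = 0 the updates can stall near the non-NE zero-gradient policies the abstract mentions; the barrier keeps every action probability bounded away from zero, which is what removes the trajectory-dependent constant from the rate.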